The goal of this exploration is to develop a hierarchical Bayesian model to estimate the true case fatality rate (CFR) for each county. In particular, we will start by taking advantage of the grouping of counties within states. The result of the model will be a “denoised” estimate of the CFR for each county in the country.

The initial motivation for this exploration is to use the distribution of the denoised CFR across counties to select the two shape parameters of a beta prior distribution. Such a prior enables the analytic calculation of a denoised posterior CFR for an arbitrary county, taking advantage of the conjugacy between a beta prior and a binomial likelihood.

Exploratory plots

Numerical summary of case fatality rates

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00571 0.01806 0.02552 0.03499 0.50000

Case fatality rates by number of cases

The most extreme CFRs come from counties with small numbers of cases

Median and IQR of county CFR by state

There appears to be meaningful clustering of CFR within states, which suggests that a model with a random effect for state is appropriate.

Modelling

We fit a binomial model to estimate an adjusted (denoised) CFR for each county by shrinking the county CFR towards the state CFR and shrinking the state CFR towards the national CFR. The prior \(N(0,1.6)\) on the intercept is chosen because this prior on the logit scale is approximately uniform over [0,1] when transformed to the probability scale.
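As a quick check of the claim about the intercept prior, the sketch below computes the distribution that a normal prior on the logit scale implies on the probability scale. It assumes that 1.6 is the standard deviation of the prior; the function names are illustrative and not part of the original analysis. If the implied distribution were exactly uniform, the implied CDF at p would equal p.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def implied_cdf(p, prior_sd=1.6):
    """P(CFR <= p) when logit(CFR) ~ Normal(0, prior_sd).

    prior_sd = 1.6 is assumed to be a standard deviation.
    """
    logit_p = math.log(p / (1.0 - p))
    return normal_cdf(logit_p / prior_sd)

# A uniform distribution on [0, 1] would give implied CDF == p.
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"p = {p:.2f}  implied CDF = {implied_cdf(p):.3f}")
```

The agreement is close in the middle of the range and looser toward 0 and 1, which is consistent with describing the prior as approximately, rather than exactly, uniform.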

I should note that with these priors, this simple model probably does not require Stan to fit (we could use e.g. glmer). However, the Stan machinery will be needed if we make the model more complex.

We could in theory obtain more precise estimates by placing a more informative prior on the national CFR; however, the gains in precision would likely be small given that there is ample data to estimate the national CFR.

Model with intercept and state and county random effects

Priors

##            prior     class      coef group resp dpar nlpar bound
## 1 normal(0, 1.6) Intercept                                      
## 2   normal(0, 1)        sd                                      
## 3                       sd            fips                      
## 4                       sd Intercept  fips                      
## 5                       sd           state                      
## 6                       sd Intercept state
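
Written out, a binomial model with an intercept, state and county random effects, and the priors listed above corresponds to the following. The symbols are notation introduced here for illustration: \(i\) indexes counties, \(s[i]\) is the state containing county \(i\), and \(N^{+}\) denotes a normal distribution truncated at zero, which is how brms treats a normal prior on a standard deviation parameter. The second argument of each normal distribution is a standard deviation.

\[
\begin{aligned}
\text{deaths}_i &\sim \text{Binomial}(\text{cases}_i,\ p_i) \\
\text{logit}(p_i) &= \alpha + u_{s[i]} + v_i \\
u_s &\sim N(0, \sigma_{\text{state}}), \qquad v_i \sim N(0, \sigma_{\text{county}}) \\
\alpha &\sim N(0, 1.6), \qquad \sigma_{\text{state}},\ \sigma_{\text{county}} \sim N^{+}(0, 1)
\end{aligned}
\]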

Fit summary

The estimated standard deviations of the state and county random effects are similar (0.57 and 0.61 on the logit scale) and relatively large, suggesting that there is meaningful variation in CFR both within and between states.

##  Family: binomial 
##   Links: mu = logit 
## Formula: deaths | trials(cases) ~ (1 | state) + (1 | fips) 
##    Data: dat (Number of observations: 3116) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~fips (Number of levels: 3116) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.61      0.01     0.58     0.63 1.00     1050     1884
## 
## ~state (Number of levels: 51) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.57      0.07     0.45     0.71 1.00      611     1133
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -3.79      0.08    -3.96    -3.64 1.01      422      638
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

National CFR

The adjusted national CFR is essentially the same as the unadjusted national CFR, because there is ample data to estimate this CFR.

cases deaths CFR CFR_adj
5066992 175196 0.03458 0.03458

State CFR

The adjusted state CFRs are very close to the unadjusted state CFRs, because there is ample data to estimate these as well. The regularization on the state random effect may become more important if we do something more complex, such as incorporating a time trend that interacts with state.

County CFR

The adjustment on the county-level CFR is much more meaningful. The many points substantially below the diagonal on this plot indicate counties where the unadjusted CFR is very large but the adjusted (denoised) CFR is much more moderate.

Case fatality rates (adjusted and unadjusted) by number of cases

The relationship between CFR and number of cases looks different after adjustment, in that CFRs for counties with few cases have been shrunk to more moderate values. Interestingly, there is a slight positive relationship (approximately linear on the log scale) between CFR and cases, for both adjusted and unadjusted CFR.

Distribution of CFR by state


Fit beta distribution to overall state-level CFR for use as prior in beta-binomial conjugate adjustment of county CFR at future times

First, fit a beta distribution to the national distribution of county-level CFRs.

Parameter estimates

shape1 shape2
2.747 101.7
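
The source does not say how the beta distribution was fitted. As one simple way to obtain shape parameters of this kind, the sketch below uses the method of moments, which is a stand-in and may differ from the method actually used (for example, maximum likelihood). The input rates are made up for illustration, and the function name is hypothetical.

```python
from statistics import mean, variance

def fit_beta_moments(rates):
    """Method-of-moments estimates of (shape1, shape2) for a beta distribution.

    Matches the sample mean and sample variance of `rates`, which must
    lie strictly between 0 and 1 on average and have a variance smaller
    than mean * (1 - mean).
    """
    m = mean(rates)
    v = variance(rates)
    if not 0.0 < m < 1.0 or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("moments are not compatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

# Hypothetical county CFRs, not taken from the data.
example_rates = [0.010, 0.020, 0.030, 0.050, 0.015]
shape1, shape2 = fit_beta_moments(example_rates)
print(f"shape1 = {shape1:.3f}, shape2 = {shape2:.3f}")
```

Whatever the fitting method, the mean of a beta distribution is shape1 / (shape1 + shape2), so the estimates above imply a prior mean CFR of 2.747 / (2.747 + 101.7), or about 0.026.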

Empirical density versus fitted distribution

Fit a separate distribution per state

state shape1 shape2 n_counties deaths cases mean_CFR mean_CFR_fitted
DE 51.15 1214 3 604 15449 0.04059 0.04043
HI 6.358 381 4 48 3733 0.01226 0.01641
RI 3.515 58.63 5 1008 17860 0.05637 0.05655
CT 5.07 57.45 8 4460 50440 0.08105 0.08109
NH 7.452 183.5 10 429 6861 0.03262 0.03902
MA 8.229 95.51 14 8944 121381 0.07562 0.07932
VT 13.93 469.4 14 58 1464 0.01987 0.02881
AZ 11.44 317.3 15 4771 188736 0.03496 0.0348
ME 3.302 81.1 16 131 4050 0.03613 0.03913
NV 13.62 617.5 16 1200 57520 0.0172 0.02157
NJ 14.92 149.3 21 15946 184815 0.09099 0.09083
WY 8.318 587.4 23 37 3069 0.02093 0.01396
MD 5.432 132.6 24 3684 96843 0.03825 0.03936
AK 32.74 2791 26 32 3819 0.01057 0.01159
UT 4.703 512.3 28 390 44733 0.007817 0.009097
NM 4.559 165.6 32 747 21337 0.0313 0.02679
OR 7.666 417.2 35 420 21772 0.01483 0.01804
WA 5.304 237.5 39 1863 63939 0.02136 0.02184
ID 3.107 218.6 42 314 25592 0.01096 0.01401
SC 7.052 233.3 46 2511 102130 0.02954 0.02934
ND 24.45 1581 53 137 7882 0.01023 0.01524
MT 10.09 538.4 54 91 5102 0.01489 0.01839
WV 4.347 177.2 55 179 7868 0.02268 0.02394
CA 4.714 272.3 59 12257 585179 0.01573 0.01702
NY 3.515 64.56 62 32468 422003 0.05083 0.05163
CO 5.973 202.1 63 1919 51420 0.02387 0.02871
LA 6.114 166.3 64 4623 132869 0.03514 0.03547
SD 13.16 786.4 65 161 9712 0.01231 0.01646
AL 3.5 151 67 2024 103849 0.02322 0.02265
FL 4.813 250.7 67 10397 542071 0.01846 0.01884
PA 5.06 102.8 67 7579 120279 0.04354 0.04692
WI 5.591 377.7 72 1081 61778 0.01575 0.01459
AR 3.522 184.4 75 696 48598 0.01912 0.01874
OK 5.915 301.9 77 730 44720 0.01843 0.01922
MS 6.684 179.5 82 2248 68293 0.03629 0.0359
MI 4.816 108.9 83 6588 93281 0.03779 0.04235
NE 4.889 272.3 86 382 28529 0.01613 0.01764
MN 3.433 178.5 87 1771 61703 0.01789 0.01887
OH 3.213 78.95 88 3986 102826 0.03786 0.03911
IN 3.453 81.26 92 3012 75854 0.03924 0.04075
TN 6.837 485.5 95 1560 118842 0.01339 0.01389
IA 4.185 183.7 99 1040 49045 0.02234 0.02227
NC 5.558 261.6 100 2535 137834 0.02043 0.02081
IL 5.525 188.6 102 7888 196862 0.0239 0.02845
KS 6.656 461.2 103 432 32010 0.01651 0.01423
MO 6.211 454.4 115 1426 60912 0.01222 0.01348
KY 4.271 168.7 120 885 35788 0.02286 0.02469
VA 4.065 152.7 133 2471 101737 0.02489 0.02592
GA 4.003 125.2 159 5041 203244 0.03155 0.03098
TX 5.325 177.3 250 11388 508463 0.0332 0.02915

Empirical density versus fitted distribution by state

The number in parentheses after each state indicates the number of counties.

Compare adjusted CFR from model to adjusted CFR computed using empirical Bayes with the state-specific beta prior

Plot adjusted CFR from model versus from EB

Where the two estimates differ, the adjusted CFRs from EB are generally lower than those from the model. To understand how similar these two adjusted estimates are relative to the unadjusted CFR, we have to look at the unadjusted CFR as well.

Plot all three CFRs for each county (ordered by model-adjusted CFR)

In most cases, the two adjusted CFRs are similar (relative to the unadjusted CFR). The Empirical Bayes adjustment tends to shrink the CFR to slightly lower values than the model adjustment.
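
The empirical Bayes adjustment relies on beta-binomial conjugacy: with a Beta(shape1, shape2) prior and a binomial likelihood, the posterior is Beta(shape1 + deaths, shape2 + cases - deaths). The sketch below assumes the adjusted CFR is taken to be the posterior mean, which the source does not state explicitly. The county counts are hypothetical; the shape parameters are the Rhode Island values from the state table.

```python
def eb_adjusted_cfr(deaths, cases, shape1, shape2):
    """Posterior mean CFR under a Beta(shape1, shape2) prior.

    The posterior is Beta(shape1 + deaths, shape2 + cases - deaths),
    whose mean is (shape1 + deaths) / (shape1 + shape2 + cases).
    """
    if cases < 0 or not 0 <= deaths <= cases:
        raise ValueError("need 0 <= deaths <= cases")
    return (shape1 + deaths) / (shape1 + shape2 + cases)

# Rhode Island prior from the state table.
ri_shape1, ri_shape2 = 3.515, 58.63

# Hypothetical small county: 2 deaths among 10 cases.
raw_cfr = 2 / 10
adjusted_cfr = eb_adjusted_cfr(2, 10, ri_shape1, ri_shape2)
print(f"raw CFR = {raw_cfr:.3f}, EB-adjusted CFR = {adjusted_cfr:.4f}")

# With no cases at all, the adjusted CFR is just the prior mean.
prior_mean = eb_adjusted_cfr(0, 0, ri_shape1, ri_shape2)
print(f"prior mean = {prior_mean:.4f}")
```

In this example the raw CFR of 0.2 is pulled most of the way back toward the state prior mean, because 10 cases carry little weight against the prior.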

One possible explanation for differences between the two adjustment methods is that in the model, there is a single variance for the county random effects across states, while in the EB method, the fitted beta distributions have differing variances by state.
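
To illustrate how much the fitted priors differ in strength, the sketch below compares two states from the table. Under the conjugate update, the posterior mean is a weighted average of the raw CFR and the prior mean, with weight cases / (shape1 + shape2 + cases) on the raw CFR, so shape1 + shape2 acts like a prior sample size. The choice of Delaware and Rhode Island and the count of 100 cases are illustrative only.

```python
import math

def beta_sd(shape1, shape2):
    """Standard deviation of a Beta(shape1, shape2) distribution."""
    total = shape1 + shape2
    return math.sqrt(shape1 * shape2 / (total ** 2 * (total + 1.0)))

def data_weight(cases, shape1, shape2):
    """Weight on the raw CFR in the beta-binomial posterior mean."""
    return cases / (shape1 + shape2 + cases)

# Shape parameters copied from the state table.
state_priors = {"DE": (51.15, 1214.0), "RI": (3.515, 58.63)}

for state, (a, b) in state_priors.items():
    print(f"{state}: prior sd = {beta_sd(a, b):.4f}, "
          f"weight on raw CFR at 100 cases = {data_weight(100, a, b):.3f}")
```

A county with 100 cases keeps most of its raw CFR under the Rhode Island prior but very little of it under the much tighter Delaware prior, whereas the model applies one county-level standard deviation across all states.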

Now plot the adjusted CFR from EB versus the adjusted CFR from the model, by state

Compare underreporting factors estimated using adjusted and unadjusted CFR

These factors assume a true mortality rate of 0.0138. The distribution of estimated underreporting factors is much heavier-tailed for the unadjusted CFRs than for the adjusted CFRs.
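
The source does not give the formula for the underreporting factor. The sketch below assumes one common definition, the ratio of the observed CFR to the true mortality rate, which equals the ratio of implied true cases (deaths divided by the true rate) to reported cases. Both the definition and the function name are assumptions made for illustration.

```python
TRUE_MORTALITY_RATE = 0.0138  # value assumed in the text

def underreporting_factor(cfr, true_rate=TRUE_MORTALITY_RATE):
    """Assumed definition: observed CFR divided by the true mortality rate."""
    if true_rate <= 0:
        raise ValueError("true_rate must be positive")
    return cfr / true_rate

# National CFR and the maximum unadjusted county CFR from the tables above.
print(f"national: {underreporting_factor(0.03458):.2f}")
print(f"largest unadjusted county CFR: {underreporting_factor(0.5):.1f}")
```

Under this definition, extreme unadjusted CFRs from small counties translate directly into extreme underreporting factors, which is one way the heavy tail can arise.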

Write out state priors for use in 19 and Me